PREFATORY NOTE


This paper is based on a presentation given at the
19th Conference of the Military Testing Association,
October 17-21, 1977, at San Antonio, Texas. The conference
was hosted by the Air Force Human Resources
Laboratory and the Air Force Occupational Measurement
Center.

Dr.  Albert  L.  Kubala,  the  paper’s  author,  is  a 
Senior  Staff  Scientist  in  HumRRO’s  Western  Division. 
He  is  presently  heading  a team  of  HumRRO  scientists 
conducting  research  for  the  Department  of  the  Army 
at  Fort  Hood,  Texas.  The  information  presented  in 
this  paper  was  developed  in  the  course  of  research 
accomplished  in  Project  HOOD,  “Human  Factors 
Research  in  Military  Organizations  and  Systems.” 


PROBLEMS  IN  MEASURING  TEAM  EFFECTIVENESS 


Albert  L.  Kubala 

Human Resources Research Organization,¹ Fort Hood, Texas

Background 

Borrowing heavily on characteristics of teams described by Glaser, Klaus, and
Egerman,² as well as Hall and Rizzo,³ Wagner, Hibbits, Rosenblatt, and Schulz⁴ define
team training as:

SLIDE  1 

The  training  of  two  or  more  individuals  who 
are  associated  together  in  work  or  activity. 

The  team  is  relatively  rigid  in  structure  and 
communication  pattern.  It  is  goal-  or  mission- 
oriented  with  the  task  of  each  team  member 
well-defined.  The  functioning  of  the  team 
depends  upon  the  coordinated  participation  of 
all  or  several  individuals.  The  focus  of  team 
training  and  feedback  is  on  team  skills  (e.g., 
coordination),  activities  and  products. 

It can be seen from the implied definition of a team that a team could be composed of
anything from a two-man crew to a unit of almost any size. However, most of the literature
dealing with teams has considered relatively small units, such as those associated with
one piece of equipment (a tank or aircraft, for example) or, at most, a platoon with a single
objective or mission. Wagner et al. further point out that, while the military services
conduct up to 90% of their training in the operational commands, most training research
has been focused on individual training in institutional settings. For example, in FY 1974,
the Army Research Institute for the Behavioral and Social Sciences (ARI) initiated the
largest program of unit training and evaluation research in history. Yet, only 11% of
the human resources budget was spent in this area. Judging from the literature, the


¹This work was performed under Contract DAHC19-75-C-0025 to the US Army Research Institute
for the Behavioral and Social Sciences (ARI). Dr. Charles O. Nystrom was the Contract Monitor.

²R. Glaser, D.J. Klaus, and K. Egerman. Increasing team proficiency through training: 2. The
acquisition and extinction of a team response, Technical Report AIR B64-5/62, American Institutes
for Research, May 1962.

³E.R. Hall and W.A. Rizzo. An assessment of US Navy tactical team training: Focus on the
trained man, TAEG Report No. 18, Training Analysis and Evaluation Group, March 1975.

⁴H. Wagner, N. Hibbits, R.D. Rosenblatt, and R. Schulz. Team training and evaluation strategies:
State-of-the-art, Technical Report 77-1, Human Resources Research Organization, Alexandria, Virginia,
February 1977.
resources devoted to this area by the other military services have been roughly comparable.
This lack of emphasis seems strange in view of the fact that most fighting has been, and
will continue to be, done by teams. It now seems critical that we determine how well
our teams do function, for as MG Gorman has stated, we must:

. . . train the Army to win on the first battlefield
of the next war against an enemy that
outnumbers us, against an enemy whose weapons
will be as good as or nearly as good as those
we possess. . . .¹

In other words, we can ill afford any but the most effective fighting teams. And, to
ensure maximum effectiveness, Measures of Effectiveness (MOE) must be derived so that
commanders can evaluate their own teams, discover deficiencies, and take corrective
measures.

Our HumRRO contingent at Fort Hood became involved in this area when we were
asked to determine what set of MOE were currently being employed to evaluate tank
crews, and to determine what additional research was needed to ensure a comprehensive
evaluation capability. We soon found that, for all practical purposes, the only MOE in
current use are scores on Table VIII, otherwise known as the Tank Crew Qualification
Course.² For those of you unfamiliar with Table VIII, it should suffice for the moment
to know that it is a live-fire gunnery exercise in which crews are scored on both hit
accuracy and times to engage targets. Looking further at this MOE, we were surprised
to find that the reliability of Table VIII scores has apparently never been determined,
and that many question its validity as a predictor of combat effectiveness. We wondered
why no other MOE were in use, and why one which was somewhat suspect was in
general use. We wondered what the underlying problems were. Therefore, we decided that
the next step should be a study of the problems associated with the development and
use of team evaluations, which is the subject of this paper.

I doubt that anything I say will really be new to any of you. My purpose in
presenting this paper is simply to re-focus your collective attention on these problems.
I feel that the areas of team training and evaluation, especially evaluation, have been
much neglected. Hopefully, this presentation will generate some interest in, and lead
some of you toward, solutions for some of the problems I will discuss. We have
painstakingly developed procedures for building training programs and evaluating individuals.
We have our interservice procedures for instructional systems development,³ and are
now, in the Army, developing individual Skill Qualification Tests (SQTs). These tests
will be designed to test actual job performance as well as knowledge, and successful
performance will be a prerequisite for both retention and promotion. However, we
have no similar procedures for either curriculum development or evaluation of teams,
and they are sorely needed.

¹W.E. DuPuy and P.F. Gorman. “TRADOC mission and resources briefing,” transcript from TV
tape, US Army Training and Doctrine Command, Fort Monroe, Virginia.

²J.A. Larson, W.K. Earl, and V.A. Henson. Assessment of US tank crew training, TCATA Test
Report No. FM 331, Final Report (23 March 75 - 13 March 76), HQ, TRADOC Combined Arms Test
Activity, Fort Hood, Texas, July 1976.

³TRADOC PAM 350-30. Interservice procedures for instructional systems development, US Army
Training and Doctrine Command, Fort Monroe, Virginia, 1 August 1975.


Problems 


The  particular  problems  which  I have  chosen  for  further  elaboration  are  shown  in 
the  next  slide. 

• Defining Effectiveness

• Defining  Team  Effectiveness 

• Problems  With  Numbers 

• Reliability 

• Evaluation  Strategies 

• Resources 

Defining effectiveness. Historically, MOE were derived to ensure the quality of
newly developed hardware. For one of our simplest weapons, the rifle, accuracy was
the original MOE. Somewhat later, rate of fire was added as an MOE. Still later, it
was realized that a highly accurate rapid-fire weapon was of little value unless it were
completely functional. Therefore, the concept of “availability” came into being as an
MOE, and was measured by such things as Mean Time Between Failure (MTBF) and
Mean Time to Repair (MTTR). However, the primary reason for the proliferation of
MOE was the recognition that effectiveness was mission-dependent. For example, the
weapon characteristics desirable for a sniper rifle are quite different from those required
for a weapon designed primarily for suppression. In selecting a rifle, a sniper would be
primarily interested in accuracy and range, but would not be too concerned about rate
of fire. On the other hand, the soldier with the suppression mission would be very
concerned with rate of fire, but not too concerned with accuracy.
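
The availability concept mentioned above can be made concrete with a small calculation. The sketch below uses the common steady-state (inherent) availability formula, MTBF / (MTBF + MTTR). The paper itself does not give a formula, and the function name and the sample figures are hypothetical, chosen only for illustration.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a system is expected to be operable.

    Uses the standard steady-state formula MTBF / (MTBF + MTTR).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical weapons: the figures are illustrative, not measured data.
weapon_a = inherent_availability(mtbf_hours=200.0, mttr_hours=50.0)
weapon_b = inherent_availability(mtbf_hours=100.0, mttr_hours=5.0)

print(f"Weapon A availability: {weapon_a:.3f}")
print(f"Weapon B availability: {weapon_b:.3f}")
```

Under these assumed figures, the weapon that fails more often is nonetheless available more of the time because it is repaired quickly, which suggests why MTBF alone would be an incomplete MOE.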

An actual example from history serves to further illustrate the problems in defining
effectiveness and the necessity to consider the mission in selecting MOE. In the early
phases of WWII, a great many British merchant vessels were damaged or even destroyed
by aircraft attacks. As a consequence, merchant vessels were equipped with antiaircraft
guns and crews. After a period of time it was discovered that only 4% of the attacking
enemy aircraft were actually shot down. This led some to conclude that the systems
were ineffective on ships and could be better employed elsewhere, where kill rates were
higher. Employing this MOE, the decision seemed inevitable. However, further
examination of the data revealed that the antiaircraft fire greatly reduced the lethality of the
enemy attack. In fact, the inclusion of antiaircraft weapons virtually halved the
probability that a ship would be sunk. Viewed in this light, the systems were considered
highly effective. In other words, the selection of the wrong MOE, or the exclusion of
critical MOE, can lead to the wrong decision about effectiveness.

One further point needs to be emphasized. Training authorities and evaluators
are not generally interested in the same kinds of MOE as hardware developers. The
hardware is developed and fielded long before they get into the act. They must train
personnel to use the equipment as it is, and must evaluate the effectiveness of the
combination of the man and machine system. It matters little if a bench-fired weapon
places 100 consecutive rounds within a 6-inch circle at 1000 meters if a typical user
cannot hit a stationary enemy at 50 meters when employing the weapon. When
evaluating training or unit readiness, the mission to be accomplished must be considered
and the criteria of success must be set realistically in terms of the potential for
man/machine effectiveness. Unfortunately, written guidance for the evaluator to aid him in

Defining team effectiveness. One of the major problems associated with the
evaluation of team effectiveness has been the inability of investigators to agree on what
differentiates team and individual tasks. Most investigators agree that it is wasteful of
effort to measure performance in a team context when the performance is actually
nothing more than an aggregate of individual performances. Individual job skills can
almost always be measured more easily, completely, and cost-effectively through
individual job performance tests. It is felt that measurement of performance in a team context
should be reserved for only those tasks which are truly team tasks; that is, tasks which
require cooperation or coordination to the extent that skills must be practiced in a team
situation in order to be optimized.

Hall and Rizzo characterized tasks performed by teams as being in either “established”
or “emergent” situations. In established task situations, the sequence of task
performance and the activities involved can be almost completely specified. Also, the assignment
of task functions among team members and the equipment they operate are virtually
fixed. In emergent situations, decision-making, problem-solving and sharing come to
the forefront. The sequence of operations is not fixed, and the allocation of functions
is variable. Hall and Rizzo essentially conclude that tasks performed in established
situations are not really team tasks. Rather, overall task performance is simply the sum of
the performances of the individual team members. Therefore, tasks performed in
established situations should not be evaluated in a team context.

Unfortunately, in discussing various tasks with knowledgeable people in the armor
community, I have found little agreement as to which tasks are established and which are
emergent. For example, some have told me that firing on the move is definitely a team
task. The advocates of this position point to the need for precise timing between the
driver, who must find a level spot at exactly the right moment and maintain his direction,
and the rest of the crew. Others feel that any accomplished driver does this habitually,
and that so long as all crew members are individually competent, the procedures
employed ensure the proper conduct of the engagement. I will not attempt to defend
either of these positions; I mention this example only to illustrate the differences of
opinion I have encountered in trying to differentiate team performances from performances
which are merely an aggregate of individual performances.

Problems with numbers. In attempting to fully describe the job situations of a
tank crew in gunnery, Kraemer, Boldovici, and Boycan¹ derived a set of 11 classes of
conditions or variables that could affect a crew’s capability to successfully engage targets.
Some examples of these classes and the levels identified for each class are
shown in the following slide. The term “levels” refers to subclasses within a main class.
If a tank gunnery objective were written for all possible combinations of levels, a total
of 1,679,616 objectives would result. However, a large number of combinations are
unrealistic (e.g., a moving bunker) and were discarded. Judicious combination of other
levels reduced the total number of realistic combinations to the current number of 266.
To test a crew’s ability to perform all of these job objectives would be time-consuming,
to say the least, and it must be remembered that these objectives cover only tank
gunnery. Obviously, it is not feasible to measure job proficiency on all possible job
objectives. Tests designed to measure effectiveness will be able to address only a limited
number of the objectives. However, the need to select a limited subset of job objectives
for testing is likely to produce unfortunate results. Training is almost certain to be
concentrated on those areas which will be tested, to the detriment of other aspects of

¹R.E. Kraemer, J.A. Boldovici, and G.G. Boycan. Job objectives for M60A1AOS tank gunnery,
ARI Research Memorandum 76-9, Human Resources Research Organization, April 1976.
CONDITIONS AND LEVELS WITHIN CONDITIONS¹

Conditions               Levels Within Conditions

Weapon                   Main Gun
                         Coaxial Machinegun
                         Caliber .50 Machinegun

Fire Delivery Method     Battlesight (non-precision for machineguns)
                         Precision
                         Range Card
                         Range Card Lay to Direct Fire

Firing Vehicle Motion    Stationary
                         Moving

Target Visibility        Visible Without Artificial Light
                         Visible With Artificial Light
                         Not Visible

Target Range             <500 meters
                         500-900 meters
                         <900 meters
                         <1100 meters
                         1100-1600 meters
                         500-3200 meters
                         1100-2300 meters
                         1100-3200 meters
                         ALL

¹R.E. Kraemer, J.A. Boldovici, and G.G. Boycan. Job objectives for M60A1AOS tank gunnery,
ARI Research Memorandum 76-9, Human Resources Research Organization, April 1976.

the  job.  This  might  be  avoided  by  testing  each  crew  on  only  a small  sample  of  jobs  from 
the  total  job  realm.  If  no  crew  knew  exactly  which  set  of  items  they  would  receive, 
they  could  not  slant  their  training  to  the  tests.  However,  the  development  of  test  items 
for  every  aspect  of  the  job  would  be  expensive.  Also,  the  resources  necessary  for  testing 
all  aspects  of  the  job  would  be  extensive.  In  short,  it  appears  that  we  have  too  many 
tasks  and  too  few  resources. 
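
The arithmetic behind these figures, and the sampling idea, can be sketched in a few lines of Python. The level counts below are those of the five classes listed on the slide. The remaining six classes are not listed in this paper, so the sketch only infers their combined factor from the reported total. The test length of 20 objectives, the seed, and the function name `draw_test_form` are hypothetical.

```python
import math
import random

# Number of levels for the five condition classes listed on the slide.
levels_shown = {
    "Weapon": 3,
    "Fire Delivery Method": 4,
    "Firing Vehicle Motion": 2,
    "Target Visibility": 3,
    "Target Range": 9,
}

TOTAL_COMBINATIONS = 1_679_616  # figure reported in the text (all 11 classes)
REALISTIC_OBJECTIVES = 266      # figure reported in the text

# Product of the levels for the five classes shown.
shown_product = math.prod(levels_shown.values())

# Combined factor implied for the six classes not shown on the slide.
remaining_factor, remainder = divmod(TOTAL_COMBINATIONS, shown_product)

print(f"Combinations from the five listed classes: {shown_product}")
print(f"Implied factor from the six unlisted classes: {remaining_factor}")


def draw_test_form(n_items: int, seed: int) -> list[int]:
    """Draw a random subset of objective numbers (1..266) for one crew."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, REALISTIC_OBJECTIVES + 1), n_items))


# Hypothetical: each crew is tested on 20 objectives not known in advance.
crew_form = draw_test_form(n_items=20, seed=1977)
print(crew_form)
```

The five listed classes alone yield 648 combinations, and the reported total implies a further factor of 2,592 from the unlisted classes. Drawing a different random form for each crew illustrates the sampling approach, although it does nothing to reduce the cost of developing items for all 266 objectives.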

Reliability. We can only hope that our MOE are valid; that is, that they are
indicative of how our teams would perform in combat. However, we usually can estimate
their reliability. We were surprised, therefore, to find that the reliability of Table VIII
scores has apparently never been determined. The only data located which even bear
on the subject are those reported by Baerman and Eaton.¹ They found a correlation of
r = .68 between ratings of tank commander motivation and Table VIII scores. This

¹Baerman and Eaton. “Crew assignment and training,” Armor, January-February
1977, 60-63.
would indicate that the reliability of the Table VIII scores was at least 0.68. However,
there were several differences between both the conduct and the scoring procedures
employed by Baerman and Eaton and those typically employed. A major difference
was that hits were determined by a close-in, after-the-fact examination of the targets
rather than by an observer riding the tank. These investigators found early in their
research that the observer determinations of hits were subject to considerable error.
Therefore, had the Table VIII scores been obtained in the usual manner, quite different
results might have been obtained. My personal feeling is that the test/retest reliability
of Table VIII scores derived as recommended in FM 17-12² would be unacceptably low.

Steinheiser and Snyder³ pointed to another reliability-related problem with
Table VIII. Assume that 70% is the passing score. Further assume that we
test 100 crews whose true level of functioning is exactly 70%. By chance, 47 of
these crews would score less than 70%, and therefore be misclassified as nonproficient.
Similarly, [figure illegible] of the crews whose true level of functioning was only 60% would, by
chance, be misclassified as proficient. Errors of misclassification could be reduced by
increasing the length of the test to improve its reliability. However, increasing the length
would also increase the resource requirements, and resources are extremely scarce at
this point in our history.
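
The misclassification argument above can be illustrated with a simple binomial model. This is a sketch under assumptions that are not in the source: each scored item is treated as an independent pass/fail trial, a crew's true level is taken as the per-item success probability, and the test lengths are hypothetical. It is not necessarily the model Steinheiser and Snyder used, so its figures should not be expected to match theirs.

```python
from math import comb


def prob_pass(n_items: int, true_level: float, cutoff_pct: int = 70) -> float:
    """Probability that a crew scores at or above the cutoff.

    Assumes n_items independent pass/fail items, each passed with
    probability true_level (a simplifying assumption).
    """
    # Smallest number of items passed that reaches the cutoff (integer ceiling).
    min_hits = -(-cutoff_pct * n_items // 100)
    return sum(
        comb(n_items, k) * true_level**k * (1 - true_level) ** (n_items - k)
        for k in range(min_hits, n_items + 1)
    )


for n in (10, 50, 100):  # hypothetical test lengths
    false_fail = 1 - prob_pass(n, 0.70)  # 70% crew scoring below the cutoff
    false_pass = prob_pass(n, 0.60)      # 60% crew scoring at or above the cutoff
    print(f"{n:>3} items: false fail {false_fail:.3f}, false pass {false_pass:.3f}")
```

Under this model, lengthening the test sharply reduces the chance that a 60% crew passes, while a crew whose true level sits exactly at the cutoff still fails a substantial share of the time at any length. That is in keeping with the point that a fixed passing score will misclassify borderline crews.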

To recapitulate, our evaluations of tank crews are currently based almost entirely on
Table VIII. Yet, Table VIII scores are of unknown but questionable
reliability. Because of this nearly total reliance on Table VIII, it is imperative that its
reliability be determined, and that every attempt be made to improve its reliability, either
by changes in scoring procedures or modifications to the conduct of the test. However,
to date I have been unable to obtain the necessary support to conduct a reliability study.
I have not examined team evaluation procedures in any other context.
Therefore, I have no idea whether other branches in the Army or other military services
face similar problems, but I strongly suspect that they do.

Testing strategies. Two principal issues divide evaluators in their approaches to
testing. These are the employment of (a) one- vs. two-sided test situations, and (b) process
vs. outcome measurements.

One-sided vs. two-sided tests. In a one-sided test, such as Table VIII, the
examinees face a relatively structured situation in which the sequence of events is
relatively fixed. “Aggressor” forces, if present at all, are restricted to specific preplanned
activities. In a two-sided test, aggressor forces must be present and typically have few
restrictions placed on their activities. The advocates of two-sided exercises stress the
importance of realism, the opportunities for real-time decision-making, and the
morale-building value of competition. They also point out that the inflexibility of one-sided
tests makes them easy to train and practice for. Therefore, they feel such tests provide
only poor indications of how the participants would actually perform in combat.
Advocates of the one-sided approach to evaluation point to the fact that repetition
of the identical circumstances is virtually impossible in a two-sided test. Therefore,
it is impossible to set exact
performance standards or to compare the performance of any two teams. I should point
out that choosing the type of test is not always a problem, for the type of data required
frequently determines the most suitable type. For example, if exact times are needed,
such as the time to fire after line-of-sight to a target is achieved, a one-sided test should

²FM 17-12, Department of the Army, Washington, D.C., March 1977.

³Steinheiser, Jr., and C.W. Snyder, Jr. “Score quality issues related to individual and weapon
performance tests,” presented at the Military Testing Association Conference, October 1977.
be employed. Knowledge of the exact moment the target appeared would be virtually
impossible to obtain in a two-sided test. One-sided tests are also generally necessary if
live-fire is required.

Two-sided  exercises  are  considered  essential  when  targets  must  be  generated.  For 
example,  a two-sided  exercise  would  be  necessary  if  the  MOE  were  to  be  the  ratio  of 
friendly  to  threat  casualties. 

Process vs. outcome measurements. Stated very simplistically, “process”
measurements are concerned with an evaluation of all of the actions taken during an
engagement, but are not particularly concerned with the final outcome. “Outcome”
measurements are not concerned with the procedures involved or the progress of the
engagement, but only with who wins and who loses.

Osborn¹ is an advocate of process measurement. He feels that to be useful, a test
must be diagnostic. That is, it must provide information on exactly why a particular
aspect of performance was successful or unsuccessful. Hammell, Gasteyer, and Pesch²
state the case for process evaluations in discussing Advanced Officer (AO) tactics training,
as shown in the next slide. In other words, Hammell et al. feel that process is the
only important aspect of performance in training evaluations. A good decision or action
may lead to a poor outcome, but the decision or action should be evaluated on its own
merits, and not on the vagaries of future actions by an unpredictable enemy.


SLIDE  5 

. . . numerous alternative sequences of actions may exist,
many of which may be equally plausible for attaining a
specific objective. The sequence of actions employed by
the AO contains a complex series of evaluations and
action selections which are situation intended. The attainment
of the ultimate objective may often be irrelevant
to the evaluation of the AO's performance. This hit or
miss philosophy, although distinctly meaningful in the
operational environment, is inadequate in the training
situation.³ ⁴

The case for outcome measurements can be stated rather simply. In an operational
environment, commanders are more interested in friendly/enemy loss ratios, resources
expended, and territory won or lost. The attainment of some set of predetermined
mission-oriented goals among these dimensions is a much more meaningful measure of
effectiveness to the field commander.


¹W.C. Osborn. Process versus product measures in performance testing, Professional Paper 16-74,
Human Resources Research Organization, Alexandria, Virginia, October 1974. (Based on paper for
Military Testing Association Meeting, San Antonio, Texas, October 1973.)

²T.J. Hammell, C.E. Gasteyer, and A.J. Pesch. Advanced officer tactics training device needs and
performance measurement technique: Volume I, NAVTRAEQUIPCEN 72-C-0053-1, General
Dynamics Corporation, Electric Boat Division, Groton, Connecticut, November 1973.

³Ibid.

⁴Italics added by author.
Perhaps you are wondering why I bring up these strategies in a paper dealing with
problems. The situation as I see it is this: We need process evaluations for feedback
to training managers, and we need outcome evaluations to meet the needs of field
commanders. Yet, it is difficult to obtain process information from a two-sided test and even
more difficult to obtain outcome information of the kind desired by commanders from
a one-sided test. It is difficult enough to obtain resources for even one type of test,
much less two. The problem is in finding a way to combine the best features of both
types of tests without undue expenditure of scarce resources.

Resources. I have already mentioned the resource problem in passing several times.
The military services are experiencing one of the longest and most severe periods of
austerity in their recent history. Yet, as has been pointed out, adequate evaluations
are quite demanding of resources. In less austere times, Baker and Cook¹ painstakingly
constructed a “Tank Platoon Combat Readiness Check.” The final checklist, including
instructions to the examiner, was approximately 90 typewritten pages in length. The
authors also pointed out that the entire evaluation took approximately 30 hours to
administer and required the use of “aggressor” forces. At the present time, most
commanders would consider the resources required for routine conduct of such an evaluation
to be out of the question.

It seems obvious that we cannot develop adequate evaluation techniques for team
performance unless additional resources can be found. While such is unlikely in an absolute
sense, the possibility of conserving resources for evaluations offers some hope.
Simulation techniques, for example, are being employed for training with increasing frequency
and with little apparent loss in training effectiveness. For example, Powers, McCluskey,
and Haggard² trained four groups of tank gunners employing 100%, 66%, 33%, and 0%
live-fire. There were no differences between the hit percentages of the four groups in a
live-fire posttraining test. Therefore, it appears that considerable ammunition could have
been saved with no loss in training effectiveness.

Whether through the use of simulation or by other means, it is our opinion that the
problem is not whether we expend the resources, but rather, how we obtain the
necessary resources. As MG Gorman has stated, we must be prepared to fight outnumbered
against an enemy whose weaponry will be virtually equal to ours. To do so, we must
be able to accurately evaluate our fighting teams, and take corrective actions to eliminate
any deficiencies.


¹R.A. Baker and J.G. Cook. The development and evaluation of the tank platoon combat
readiness check, Research Memorandum, Human Resources Research Organization, Alexandria, Virginia,
April 1963.

²T.R. Powers, M.R. McCluskey, and D.F. Haggard. Determination of the contribution of live
firing to weapons proficiency, Final Report FR-CD(C)-75-1, Human Resources Research Organization,
Alexandria, Virginia, March 1975.


